Overview

Dataset statistics

Number of variables: 21
Number of observations: 5318
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 872.6 KiB
Average record size in memory: 168.0 B

Variable types

NUM: 21

Reproduction

Analysis started: 2020-07-19 18:07:05.625354
Analysis finished: 2020-07-19 18:08:47.818735
Duration: 1 minute and 42.19 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

std_fixed_acidity is highly correlated with fixed acidity | High correlation
fixed acidity is highly correlated with std_fixed_acidity | High correlation
std_volatile_acidity is highly correlated with volatile acidity | High correlation
volatile acidity is highly correlated with std_volatile_acidity | High correlation
std_citric_acid is highly correlated with citric acid | High correlation
citric acid is highly correlated with std_citric_acid | High correlation
std_residual_sugar is highly correlated with residual sugar | High correlation
residual sugar is highly correlated with std_residual_sugar | High correlation
std_free_sulfur_dioxide is highly correlated with free sulfur dioxide | High correlation
free sulfur dioxide is highly correlated with std_free_sulfur_dioxide | High correlation
std_alcohol is highly correlated with alcohol | High correlation
alcohol is highly correlated with std_alcohol | High correlation
std_density is highly correlated with density | High correlation
density is highly correlated with std_density | High correlation
df_index has unique values | Unique
citric acid has 136 (2.6%) zeros | Zeros
std_citric_acid has 136 (2.6%) zeros | Zeros
std_residual_sugar has 77 (1.4%) zeros | Zeros

Variables

df_index
Real number (ℝ≥0)

UNIQUE

Distinct count: 5318
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3190.1861602106055
Minimum: 0
Maximum: 6496
Zeros: 1
Zeros (%): < 0.1%
Memory size: 41.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 304.85
Q1: 1566.5
median: 3138.5
Q3: 4806.75
95-th percentile: 6186.15
Maximum: 6496
Range: 6496
Interquartile range (IQR): 3240.25

Descriptive statistics

Standard deviation: 1882.325731
Coefficient of variation (CV): 0.590036329
Kurtosis: -1.188406186
Mean: 3190.18616
Median Absolute Deviation (MAD): 1619
Skewness: 0.05131095624
Sum: 16965410
Variance: 3543150.156
Histogram with fixed size bins (bins=10)
Value | Count | Frequency (%)
4094 | 1 | < 0.1%
2628 | 1 | < 0.1%
4683 | 1 | < 0.1%
2636 | 1 | < 0.1%
589 | 1 | < 0.1%
4687 | 1 | < 0.1%
2640 | 1 | < 0.1%
593 | 1 | < 0.1%
4691 | 1 | < 0.1%
2644 | 1 | < 0.1%
Other values (5308) | 5308 | 99.8%

Value | Count | Frequency (%)
0 | 1 | < 0.1%
1 | 1 | < 0.1%
2 | 1 | < 0.1%
3 | 1 | < 0.1%
5 | 1 | < 0.1%

Value | Count | Frequency (%)
6496 | 1 | < 0.1%
6495 | 1 | < 0.1%
6494 | 1 | < 0.1%
6493 | 1 | < 0.1%
6492 | 1 | < 0.1%

fixed acidity
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 106
Unique (%): 2.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.282274607678895
Minimum: 0.0
Maximum: 1.0
Zeros: 1
Zeros (%): < 0.1%
Memory size: 41.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.1487603306
Q1: 0.2148760331
median: 0.2644628099
Q3: 0.3223140496
95-th percentile: 0.4958677686
Maximum: 1
Range: 1
Interquartile range (IQR): 0.1074380165

Descriptive statistics

Standard deviation: 0.1090724514
Coefficient of variation (CV): 0.3864054663
Kurtosis: 4.587585048
Mean: 0.2822746077
Median Absolute Deviation (MAD): 0.04958677686
Skewness: 1.65005523
Sum: 1501.136364
Variance: 0.01189679966
Histogram with fixed size bins (bins=10)
Value | Count | Frequency (%)
0.2479338843 | 279 | 5.2%
0.2314049587 | 269 | 5.1%
0.2148760331 | 246 | 4.6%
0.2561983471 | 225 | 4.2%
0.2644628099 | 223 | 4.2%
0.2396694215 | 211 | 4.0%
0.2809917355 | 201 | 3.8%
0.2727272727 | 200 | 3.8%
0.2231404959 | 197 | 3.7%
0.1983471074 | 177 | 3.3%
Other values (96) | 3090 | 58.1%

Value | Count | Frequency (%)
0 | 1 | < 0.1%
0.00826446281 | 1 | < 0.1%
0.03305785124 | 2 | < 0.1%
0.04958677686 | 3 | 0.1%
0.05785123967 | 1 | < 0.1%

Value | Count | Frequency (%)
1 | 1 | < 0.1%
0.9752066116 | 2 | < 0.1%
0.9669421488 | 1 | < 0.1%
0.9256198347 | 1 | < 0.1%
0.867768595 | 1 | < 0.1%

volatile acidity
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 187
Unique (%): 3.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.1761138272533534
Minimum: 0.0
Maximum: 1.0
Zeros: 2
Zeros (%): < 0.1%
Memory size: 41.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.05333333333
Q1: 0.1
median: 0.1466666667
Q3: 0.22
95-th percentile: 0.4
Maximum: 1
Range: 1
Interquartile range (IQR): 0.12

Descriptive statistics

Standard deviation: 0.1121762143
Coefficient of variation (CV): 0.6369529075
Kurtosis: 2.861537205
Mean: 0.1761138273
Median Absolute Deviation (MAD): 0.05333333333
Skewness: 1.504115403
Sum: 936.5733333
Variance: 0.01258350306
Histogram with fixed size bins (bins=10)
Value | Count | Frequency (%)
0.1333333333 | 231 | 4.3%
0.12 | 219 | 4.1%
0.1066666667 | 219 | 4.1%
0.1266666667 | 188 | 3.5%
0.1133333333 | 186 | 3.5%
0.09333333333 | 183 | 3.4%
0.08 | 178 | 3.3%
0.1 | 178 | 3.3%
0.1466666667 | 168 | 3.2%
0.16 | 164 | 3.1%
Other values (177) | 3404 | 64.0%

Value | Count | Frequency (%)
0 | 2 | < 0.1%
0.003333333333 | 1 | < 0.1%
0.006666666667 | 1 | < 0.1%
0.01333333333 | 6 | 0.1%
0.01666666667 | 4 | 0.1%

Value | Count | Frequency (%)
1 | 1 | < 0.1%
0.8333333333 | 2 | < 0.1%
0.7733333333 | 1 | < 0.1%
0.7366666667 | 1 | < 0.1%
0.7333333333 | 1 | < 0.1%

citric acid
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct count: 89
Unique (%): 1.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.1918807233446762
Minimum: 0.0
Maximum: 1.0
Zeros: 136
Zeros (%): 2.6%
Memory size: 41.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.02409638554
Q1: 0.1445783133
median: 0.186746988
Q3: 0.2409638554
95-th percentile: 0.3373493976
Maximum: 1
Range: 1
Interquartile range (IQR): 0.09638554217

Descriptive statistics

Standard deviation: 0.0886605652
Coefficient of variation (CV): 0.4620608243
Kurtosis: 2.581238061
Mean: 0.1918807233
Median Absolute Deviation (MAD): 0.0421686747
Skewness: 0.4838167655
Sum: 1020.421687
Variance: 0.007860695822
Histogram with fixed size bins (bins=10)
Value | Count | Frequency (%)
0.1807228916 | 264 | 5.0%
0.1927710843 | 240 | 4.5%
0.1686746988 | 235 | 4.4%
0.2951807229 | 232 | 4.4%
0.2048192771 | 203 | 3.8%
0.156626506 | 203 | 3.8%
0.1746987952 | 198 | 3.7%
0.186746988 | 188 | 3.5%
0.1445783133 | 184 | 3.5%
0.1626506024 | 179 | 3.4%
Other values (79) | 3192 | 60.0%

Value | Count | Frequency (%)
0 | 136 | 2.6%
0.006024096386 | 31 | 0.6%
0.01204819277 | 44 | 0.8%
0.01807228916 | 26 | 0.5%
0.02409638554 | 34 | 0.6%

Value | Count | Frequency (%)
1 | 1 | < 0.1%
0.7409638554 | 1 | < 0.1%
0.6024096386 | 6 | 0.1%
0.5963855422 | 1 | < 0.1%
0.5481927711 | 1 | < 0.1%

residual sugar
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 316
Unique (%): 5.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.06824547721939841
Minimum: 0.0
Maximum: 1.0
Zeros: 1
Zeros (%): < 0.1%
Memory size: 41.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.007668711656
Q1: 0.01840490798
median: 0.03220858896
Q3: 0.1058282209
95-th percentile: 0.2116564417
Maximum: 1
Range: 1
Interquartile range (IQR): 0.08742331288

Descriptive statistics

Standard deviation: 0.06902830452
Coefficient of variation (CV): 1.011470757
Kurtosis: 7.023430249
Mean: 0.06824547722
Median Absolute Deviation (MAD): 0.02300613497
Skewness: 1.706026823
Sum: 362.9294479
Variance: 0.004764906825
Histogram with fixed size bins (bins=10)
Value | Count | Frequency (%)
0.01533742331 | 200 | 3.8%
0.02147239264 | 200 | 3.8%
0.01226993865 | 194 | 3.6%
0.01840490798 | 193 | 3.6%
0.009202453988 | 172 | 3.2%
0.0245398773 | 158 | 3.0%
0.01380368098 | 150 | 2.8%
0.01993865031 | 149 | 2.8%
0.02300613497 | 148 | 2.8%
0.01687116564 | 148 | 2.8%
Other values (306) | 3606 | 67.8%

Value | Count | Frequency (%)
0 | 1 | < 0.1%
0.001533742331 | 7 | 0.1%
0.003067484663 | 25 | 0.5%
0.004601226994 | 36 | 0.7%
0.00536809816 | 3 | 0.1%

Value | Count | Frequency (%)
1 | 1 | < 0.1%
0.4754601227 | 1 | < 0.1%
0.3903374233 | 1 | < 0.1%
0.3512269939 | 1 | < 0.1%
0.3374233129 | 1 | < 0.1%

chlorides
Real number (ℝ≥0)

Distinct count: 214
Unique (%): 4.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.07923663006225956
Minimum: 0.0
Maximum: 1.0
Zeros: 1
Zeros (%): < 0.1%
Memory size: 41.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.03156146179
Q1: 0.04817275748
median: 0.06312292359
Q3: 0.09468438538
95-th percentile: 0.157807309
Maximum: 1
Range: 1
Interquartile range (IQR): 0.04651162791

Descriptive statistics

Standard deviation: 0.06123721418
Coefficient of variation (CV): 0.7728397098
Kurtosis: 48.26517633
Mean: 0.07923663006
Median Absolute Deviation (MAD): 0.01827242525
Skewness: 5.339077121
Sum: 421.3803987
Variance: 0.0037499964
Histogram with fixed size bins (bins=10)
Value | Count | Frequency (%)
0.04485049834 | 165 | 3.1%
0.05813953488 | 160 | 3.0%
0.05481727575 | 158 | 3.0%
0.06146179402 | 156 | 2.9%
0.05149501661 | 152 | 2.9%
0.06312292359 | 148 | 2.8%
0.06478405316 | 143 | 2.7%
0.04817275748 | 142 | 2.7%
0.06810631229 | 141 | 2.7%
0.0415282392 | 138 | 2.6%
Other values (204) | 3815 | 71.7%

Value | Count | Frequency (%)
0 | 1 | < 0.1%
0.004983388704 | 1 | < 0.1%
0.006644518272 | 1 | < 0.1%
0.008305647841 | 4 | 0.1%
0.009966777409 | 3 | 0.1%

Value | Count | Frequency (%)
1 | 1 | < 0.1%
0.9983388704 | 1 | < 0.1%
0.7607973422 | 1 | < 0.1%
0.7558139535 | 1 | < 0.1%
0.6860465116 | 1 | < 0.1%

free sulfur dioxide
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 135
Unique (%): 2.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.10080021729137938
Minimum: 0.0
Maximum: 1.0
Zeros: 2
Zeros (%): < 0.1%
Memory size: 41.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.01736111111
Q1: 0.05208333333
median: 0.09375
Q3: 0.1388888889
95-th percentile: 0.2083333333
Maximum: 1
Range: 1
Interquartile range (IQR): 0.08680555556

Descriptive statistics

Standard deviation: 0.06182071096
Coefficient of variation (CV): 0.6132993819
Kurtosis: 9.528239672
Mean: 0.1008002173
Median Absolute Deviation (MAD): 0.04166666667
Skewness: 1.363771546
Sum: 536.0555556
Variance: 0.003821800304
Histogram with fixed size bins (bins=10)
Value | Count | Frequency (%)
0.01736111111 | 150 | 2.8%
0.09722222222 | 144 | 2.7%
0.08680555556 | 134 | 2.5%
0.04861111111 | 132 | 2.5%
0.07986111111 | 128 | 2.4%
0.05555555556 | 124 | 2.3%
0.1041666667 | 124 | 2.3%
0.1145833333 | 124 | 2.3%
0.07638888889 | 121 | 2.3%
0.09375 | 115 | 2.2%
Other values (125) | 4022 | 75.6%

Value | Count | Frequency (%)
0 | 2 | < 0.1%
0.003472222222 | 2 | < 0.1%
0.006944444444 | 50 | 0.9%
0.01041666667 | 43 | 0.8%
0.01388888889 | 111 | 2.1%

Value | Count | Frequency (%)
1 | 1 | < 0.1%
0.5052083333 | 1 | < 0.1%
0.4774305556 | 1 | < 0.1%
0.4513888889 | 1 | < 0.1%
0.4409722222 | 1 | < 0.1%

total sulfur dioxide
Real number (ℝ≥0)

Distinct count: 276
Unique (%): 5.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.249096191874219
Minimum: 0.0
Maximum: 1.0
Zeros: 2
Zeros (%): < 0.1%
Memory size: 41.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.02995391705
Q1: 0.1566820276
median: 0.2534562212
Q3: 0.340437788
95-th percentile: 0.4608294931
Maximum: 1
Range: 1
Interquartile range (IQR): 0.1837557604

Descriptive statistics

Standard deviation: 0.1308383586
Coefficient of variation (CV): 0.5252523438
Kurtosis: -0.300810682
Mean: 0.2490961919
Median Absolute Deviation (MAD): 0.08986175115
Skewness: 0.06366693644
Sum: 1324.693548
Variance: 0.01711867609
Histogram with fixed size bins (bins=10)
Value | Count | Frequency (%)
0.2419354839 | 54 | 1.0%
0.2465437788 | 50 | 0.9%
0.2488479263 | 49 | 0.9%
0.2119815668 | 48 | 0.9%
0.267281106 | 48 | 0.9%
0.2811059908 | 46 | 0.9%
0.2557603687 | 44 | 0.8%
0.2188940092 | 43 | 0.8%
0.331797235 | 43 | 0.8%
0.2258064516 | 43 | 0.8%
Other values (266) | 4850 | 91.2%

Value | Count | Frequency (%)
0 | 2 | < 0.1%
0.002304147465 | 4 | 0.1%
0.004608294931 | 11 | 0.2%
0.006912442396 | 14 | 0.3%
0.009216589862 | 24 | 0.5%

Value | Count | Frequency (%)
1 | 1 | < 0.1%
0.8306451613 | 1 | < 0.1%
0.7788018433 | 1 | < 0.1%
0.7073732719 | 1 | < 0.1%
0.6947004608 | 1 | < 0.1%

density
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 998
Unique (%): 18.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.14317029011908292
Minimum: 0.0
Maximum: 1.0
Zeros: 1
Zeros (%): < 0.1%
Memory size: 41.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.05279545016
Q1: 0.09812994024
median: 0.1455561982
Q3: 0.1862348178
95-th percentile: 0.232369385
Maximum: 1
Range: 1
Interquartile range (IQR): 0.08810487758

Descriptive statistics

Standard deviation: 0.05717257262
Coefficient of variation (CV): 0.3993326588
Kurtosis: 8.713920947
Mean: 0.1431702901
Median Absolute Deviation (MAD): 0.04357046462
Skewness: 0.6660333553
Sum: 761.3796029
Variance: 0.00326870306
Histogram with fixed size bins (bins=10)
Value | Count | Frequency (%)
0.09427414691 | 60 | 1.1%
0.1945247735 | 59 | 1.1%
0.2099479468 | 53 | 1.0%
0.1096973202 | 53 | 1.0%
0.2022363601 | 52 | 1.0%
0.1868131868 | 51 | 1.0%
0.1174089069 | 50 | 0.9%
0.1212647002 | 50 | 0.9%
0.1752458068 | 49 | 0.9%
0.1829573935 | 48 | 0.9%
Other values (988) | 4793 | 90.1%

Value | Count | Frequency (%)
0 | 1 | < 0.1%
0.0003855793329 | 1 | < 0.1%
0.002120686331 | 1 | < 0.1%
0.005590900328 | 1 | < 0.1%
0.005976479661 | 2 | < 0.1%

Value | Count | Frequency (%)
1 | 1 | < 0.1%
0.4470792366 | 1 | < 0.1%
0.319645267 | 1 | < 0.1%
0.3101985734 | 1 | < 0.1%
0.309234625 | 2 | < 0.1%

pH
Real number (ℝ≥0)

Distinct count: 108
Unique (%): 2.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.39119299381069406
Minimum: 0.0
Maximum: 1.0
Zeros: 1
Zeros (%): < 0.1%
Memory size: 41.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.2015503876
Q1: 0.3023255814
median: 0.3798449612
Q3: 0.4728682171
95-th percentile: 0.6046511628
Maximum: 1
Range: 1
Interquartile range (IQR): 0.1705426357

Descriptive statistics

Standard deviation: 0.124343644
Coefficient of variation (CV): 0.3178575433
Kurtosis: 0.4312551428
Mean: 0.3911929938
Median Absolute Deviation (MAD): 0.08527131783
Skewness: 0.3903583769
Sum: 2080.364341
Variance: 0.0154613418
Histogram with fixed size bins (bins=10)
Value | Count | Frequency (%)
0.3410852713 | 156 | 2.9%
0.3875968992 | 154 | 2.9%
0.3255813953 | 146 | 2.7%
0.3333333333 | 144 | 2.7%
0.3720930233 | 142 | 2.7%
0.4031007752 | 141 | 2.7%
0.3565891473 | 139 | 2.6%
0.3643410853 | 136 | 2.6%
0.3100775194 | 130 | 2.4%
0.3488372093 | 127 | 2.4%
Other values (98) | 3903 | 73.4%

Value | Count | Frequency (%)
0 | 1 | < 0.1%
0.01550387597 | 2 | < 0.1%
0.03875968992 | 1 | < 0.1%
0.05426356589 | 2 | < 0.1%
0.06201550388 | 3 | 0.1%

Value | Count | Frequency (%)
1 | 2 | < 0.1%
0.9147286822 | 2 | < 0.1%
0.8759689922 | 1 | < 0.1%
0.8527131783 | 1 | < 0.1%
0.8449612403 | 1 | < 0.1%

sulphates
Real number (ℝ≥0)

Distinct count: 111
Unique (%): 2.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.17606834536934135
Minimum: 0.0
Maximum: 1.0
Zeros: 1
Zeros (%): < 0.1%
Memory size: 41.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.07303370787
Q1: 0.1179775281
median: 0.1629213483
Q3: 0.2134831461
95-th percentile: 0.3210674157
Maximum: 1
Range: 1
Interquartile range (IQR): 0.09550561798

Descriptive statistics

Standard deviation: 0.08413073285
Coefficient of variation (CV): 0.4778299738
Kurtosis: 8.611916138
Mean: 0.1760683454
Median Absolute Deviation (MAD): 0.04494382022
Skewness: 1.809098753
Sum: 936.3314607
Variance: 0.00707798021
Histogram with fixed size bins (bins=10)
Value | Count | Frequency (%)
0.1573033708 | 212 | 4.0%
0.1348314607 | 196 | 3.7%
0.1797752809 | 194 | 3.6%
0.1235955056 | 182 | 3.4%
0.1460674157 | 170 | 3.2%
0.08988764045 | 165 | 3.1%
0.1685393258 | 162 | 3.0%
0.1404494382 | 161 | 3.0%
0.1516853933 | 159 | 3.0%
0.1292134831 | 157 | 3.0%
Other values (101) | 3560 | 66.9%

Value | Count | Frequency (%)
0 | 1 | < 0.1%
0.005617977528 | 1 | < 0.1%
0.01685393258 | 4 | 0.1%
0.02247191011 | 3 | 0.1%
0.02808988764 | 10 | 0.2%

Value | Count | Frequency (%)
1 | 1 | < 0.1%
0.9887640449 | 1 | < 0.1%
0.9719101124 | 1 | < 0.1%
0.7865168539 | 1 | < 0.1%
0.7808988764 | 1 | < 0.1%

alcohol
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 111
Unique (%): 2.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.3694524838621181
Minimum: 0.0
Maximum: 1.0
Zeros: 2
Zeros (%): < 0.1%
Memory size: 41.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.1449275362
Q1: 0.2173913043
median: 0.347826087
Q3: 0.4927536232
95-th percentile: 0.6811594203
Maximum: 1
Range: 1
Interquartile range (IQR): 0.2753623188

Descriptive statistics

Standard deviation: 0.171878793
Coefficient of variation (CV): 0.465225707
Kurtosis: -0.5377448203
Mean: 0.3694524839
Median Absolute Deviation (MAD): 0.1304347826
Skewness: 0.5458655791
Sum: 1964.748309
Variance: 0.02954231949
Histogram with fixed size bins (bins=10)
Value | Count | Frequency (%)
0.2173913043 | 287 | 5.4%
0.2028985507 | 260 | 4.9%
0.1739130435 | 205 | 3.9%
0.2898550725 | 204 | 3.8%
0.3623188406 | 193 | 3.6%
0.4347826087 | 176 | 3.3%
0.2608695652 | 173 | 3.3%
0.1884057971 | 169 | 3.2%
0.347826087 | 166 | 3.1%
0.3188405797 | 158 | 3.0%
Other values (101) | 3327 | 62.6%

Value | Count | Frequency (%)
0 | 2 | < 0.1%
0.05797101449 | 4 | 0.1%
0.07246376812 | 10 | 0.2%
0.08695652174 | 16 | 0.3%
0.1014492754 | 48 | 0.9%

Value | Count | Frequency (%)
1 | 1 | < 0.1%
0.8985507246 | 1 | < 0.1%
0.8768115942 | 1 | < 0.1%
0.8695652174 | 11 | 0.2%
0.8550724638 | 3 | 0.1%

quality
Real number (ℝ≥0)

Distinct count: 7
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.795599849567506
Minimum: 3
Maximum: 9
Zeros: 0
Zeros (%): 0.0%
Memory size: 41.5 KiB

Quantile statistics

Minimum: 3
5-th percentile: 5
Q1: 5
median: 6
Q3: 6
95-th percentile: 7
Maximum: 9
Range: 6
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 0.879714873
Coefficient of variation (CV): 0.1517901332
Kurtosis: 0.2994655766
Mean: 5.79559985
Median Absolute Deviation (MAD): 1
Skewness: 0.1474702912
Sum: 30821
Variance: 0.7738982578
Histogram with fixed size bins (bins=10)
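Value | Count | Frequency (%)
6 | 2323 | 43.7%
5 | 1751 | 32.9%
7 | 855 | 16.1%
4 | 206 | 3.9%
8 | 148 | 2.8%
3 | 30 | 0.6%
9 | 5 | 0.1%

Value | Count | Frequency (%)
3 | 30 | 0.6%
4 | 206 | 3.9%
5 | 1751 | 32.9%
6 | 2323 | 43.7%
7 | 855 | 16.1%

Value | Count | Frequency (%)
9 | 5 | 0.1%
8 | 148 | 2.8%
7 | 855 | 16.1%
6 | 2323 | 43.7%
5 | 1751 | 32.9%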

std_fixed_acidity
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 106
Unique (%): 2.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.9614535517852836
Minimum: 1.33500106673234
Maximum: 2.766319109226186
Zeros: 0
Zeros (%): 0.0%
Memory size: 41.5 KiB

Quantile statistics

Minimum: 1.335001067
5-th percentile: 1.722766598
Q1: 1.85629799
median: 1.945910149
Q3: 2.041220329
95-th percentile: 2.282382386
Maximum: 2.766319109
Range: 1.431318042
Interquartile range (IQR): 0.1849223385

Descriptive statistics

Standard deviation: 0.1677209065
Coefficient of variation (CV): 0.0855084773
Kurtosis: 1.668477037
Mean: 1.961453552
Median Absolute Deviation (MAD): 0.08961215869
Skewness: 0.8427240206
Sum: 10431.00999
Variance: 0.02813030248
Histogram with fixed size bins (bins=10)
Value | Count | Frequency (%)
1.916922612 | 279 | 5.2%
1.887069649 | 269 | 5.1%
1.85629799 | 246 | 4.6%
1.931521412 | 225 | 4.2%
1.945910149 | 223 | 4.2%
1.902107526 | 211 | 4.0%
1.974081026 | 201 | 3.8%
1.960094784 | 200 | 3.8%
1.871802177 | 197 | 3.7%
1.824549292 | 177 | 3.3%
Other values (96) | 3090 | 58.1%

Value | Count | Frequency (%)
1.335001067 | 1 | < 0.1%
1.360976553 | 1 | < 0.1%
1.435084525 | 2 | < 0.1%
1.481604541 | 3 | 0.1%
1.504077397 | 1 | < 0.1%

Value | Count | Frequency (%)
2.766319109 | 1 | < 0.1%
2.747270914 | 2 | < 0.1%
2.740840024 | 1 | < 0.1%
2.708050201 | 1 | < 0.1%
2.660259537 | 1 | < 0.1%

std_volatile_acidity
Real number (ℝ)

HIGH CORRELATION

Distinct count: 187
Unique (%): 3.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -1.1686942976407284
Minimum: -2.5257286443082556
Maximum: 0.4574248470388755
Zeros: 2
Zeros (%): < 0.1%
Memory size: 41.5 KiB

Quantile statistics

Minimum: -2.525728644
5-th percentile: -1.832581464
Q1: -1.46967597
median: -1.203972804
Q3: -0.8915981193
95-th percentile: -0.3856624808
Maximum: 0.457424847
Range: 2.983153491
Interquartile range (IQR): 0.5780778508

Descriptive statistics

Standard deviation: 0.4423802788
Coefficient of variation (CV): -0.3785252309
Kurtosis: -0.2011904145
Mean: -1.168694298
Median Absolute Deviation (MAD): 0.2876820725
Skewness: 0.3299671103
Sum: -6215.116275
Variance: 0.1957003111
Histogram with fixed size bins (bins=10)
Value | Count | Frequency (%)
-1.272965676 | 231 | 4.3%
-1.347073648 | 219 | 4.1%
-1.427116356 | 219 | 4.1%
-1.30933332 | 188 | 3.5%
-1.386294361 | 186 | 3.5%
-1.514127733 | 183 | 3.4%
-1.46967597 | 178 | 3.3%
-1.609437912 | 178 | 3.3%
-1.203972804 | 168 | 3.2%
-1.139434283 | 164 | 3.1%
Other values (177) | 3404 | 64.0%

Value | Count | Frequency (%)
-2.525728644 | 2 | < 0.1%
-2.465104022 | 1 | < 0.1%
-2.407945609 | 1 | < 0.1%
-2.302585093 | 6 | 0.1%
-2.253794929 | 4 | 0.1%

Value | Count | Frequency (%)
0.457424847 | 1 | < 0.1%
0.2851789422 | 2 | < 0.1%
0.2151113796 | 1 | < 0.1%
0.1697427746 | 1 | < 0.1%
0.1655144385 | 1 | < 0.1%

std_citric_acid
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct count: 89
Unique (%): 1.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.5421781970843025
Minimum: 0.0
Maximum: 1.2884098726725126
Zeros: 136
Zeros (%): 2.6%
Memory size: 41.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.2
Q1: 0.4898979486
median: 0.5567764363
Q3: 0.632455532
95-th percentile: 0.7483314774
Maximum: 1.288409873
Range: 1.288409873
Interquartile range (IQR): 0.1425575835

Descriptive statistics

Standard deviation: 0.1567463665
Coefficient of variation (CV): 0.2891048872
Kurtosis: 2.926399681
Mean: 0.5421781971
Median Absolute Deviation (MAD): 0.06687848773
Skewness: -1.221647338
Sum: 2883.303652
Variance: 0.02456942341
Histogram with fixed size bins (bins=10)
Value | Count | Frequency (%)
0.5477225575 | 264 | 5.0%
0.5656854249 | 240 | 4.5%
0.5291502622 | 235 | 4.4%
0.7 | 232 | 4.4%
0.5099019514 | 203 | 3.8%
0.5830951895 | 203 | 3.8%
0.5385164807 | 198 | 3.7%
0.5567764363 | 188 | 3.5%
0.4898979486 | 184 | 3.5%
0.5196152423 | 179 | 3.4%
Other values (79) | 3192 | 60.0%

Value | Count | Frequency (%)
0 | 136 | 2.6%
0.1 | 31 | 0.6%
0.1414213562 | 44 | 0.8%
0.1732050808 | 26 | 0.5%
0.2 | 34 | 0.6%

Value | Count | Frequency (%)
1.288409873 | 1 | < 0.1%
1.109053651 | 1 | < 0.1%
1 | 6 | 0.1%
0.9949874371 | 1 | < 0.1%
0.9539392014 | 1 | < 0.1%

std_residual_sugar
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct count: 316
Unique (%): 5.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.2604991241345513
Minimum: -0.5108256237659907
Maximum: 4.186619838331271
Zeros: 77
Zeros (%): 1.4%
Memory size: 41.5 KiB

Quantile statistics

Minimum: -0.5108256238
5-th percentile: 0.0953101798
Q1: 0.5877866649
median: 0.993251773
Q3: 2.014903021
95-th percentile: 2.667228207
Maximum: 4.186619838
Range: 4.697445462
Interquartile range (IQR): 1.427116356

Descriptive statistics

Standard deviation: 0.8404588387
Coefficient of variation (CV): 0.6667666979
Kurtosis: -1.133897628
Mean: 1.260499124
Median Absolute Deviation (MAD): 0.6567795364
Skewness: 0.3258942942
Sum: 6703.334342
Variance: 0.7063710595
Histogram with fixed size bins (bins=10)
Value | Count | Frequency (%)
0.4700036292 | 200 | 3.8%
0.6931471806 | 200 | 3.8%
0.3364722366 | 194 | 3.6%
0.5877866649 | 193 | 3.6%
0.1823215568 | 172 | 3.2%
0.7884573604 | 158 | 3.0%
0.4054651081 | 150 | 2.8%
0.6418538862 | 149 | 2.8%
0.5306282511 | 148 | 2.8%
0.7419373447 | 148 | 2.8%
Other values (306) | 3606 | 67.8%

Value | Count | Frequency (%)
-0.5108256238 | 1 | < 0.1%
-0.3566749439 | 7 | 0.1%
-0.2231435513 | 25 | 0.5%
-0.1053605157 | 36 | 0.7%
-0.05129329439 | 3 | 0.1%

Value | Count | Frequency (%)
4.186619838 | 1 | < 0.1%
3.453157121 | 1 | < 0.1%
3.260017768 | 1 | < 0.1%
3.157000421 | 1 | < 0.1%
3.117949906 | 1 | < 0.1%

std_chlorides
Real number (ℝ)

Distinct count: 214
Unique (%): 4.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -2.98535150305032
Minimum: -4.710530701645918
Maximum: -0.49265831981054176
Zeros: 0
Zeros (%): 0.0%
Memory size: 41.5 KiB

Quantile statistics

Minimum: -4.710530702
5-th percentile: -3.575550769
Q1: -3.270169119
median: -3.057607677
Q3: -2.718100537
95-th percentile: -2.26336438
Maximum: -0.4926583198
Range: 4.217872382
Interquartile range (IQR): 0.5520685823

Descriptive statistics

Standard deviation: 0.4420033191
Coefficient of variation (CV): -0.1480573791
Kurtosis: 2.300369307
Mean: -2.985351503
Median Absolute Deviation (MAD): 0.2607262625
Skewness: 0.913695014
Sum: -15876.09929
Variance: 0.1953669341
Histogram with fixed size bins (bins=10)
Value | Count | Frequency (%)
-3.324236341 | 165 | 3.1%
-3.123565645 | 160 | 3.0%
-3.170085661 | 158 | 3.0%
-3.079113882 | 156 | 2.9%
-3.218875825 | 152 | 2.9%
-3.057607677 | 148 | 2.8%
-3.036554268 | 143 | 2.7%
-3.270169119 | 142 | 2.7%
-2.995732274 | 141 | 2.7%
-3.381394754 | 138 | 2.6%
Other values (204) | 3815 | 71.7%

Value | Count | Frequency (%)
-4.710530702 | 1 | < 0.1%
-4.422848629 | 1 | < 0.1%
-4.342805922 | 1 | < 0.1%
-4.268697949 | 4 | 0.1%
-4.199705078 | 3 | 0.1%

Value | Count | Frequency (%)
-0.4926583198 | 1 | < 0.1%
-0.4942963218 | 1 | < 0.1%
-0.7614260213 | 1 | < 0.1%
-0.7678707268 | 1 | < 0.1%
-0.8627499649 | 1 | < 0.1%

std_free_sulfur_dioxide
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 135
Unique (%): 2.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.195709316672157
Minimum: 0.0
Maximum: 5.666426688112432
Zeros: 2
Zeros (%): < 0.1%
Memory size: 41.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1.791759469
Q1: 2.772588722
median: 3.33220451
Q3: 3.713572067
95-th percentile: 4.110873864
Maximum: 5.666426688
Range: 5.666426688
Interquartile range (IQR): 0.9409833445

Descriptive statistics

Standard deviation: 0.7030299733
Coefficient of variation (CV): 0.2199918402
Kurtosis: 0.3238164863
Mean: 3.195709317
Median Absolute Deviation (MAD): 0.4418327523
Skewness: -0.7889486262
Sum: 16994.78215
Variance: 0.4942511434
Histogram with fixed size bins (bins=10)
Value | Count | Frequency (%)
1.791759469 | 150 | 2.8%
3.36729583 | 144 | 2.7%
3.258096538 | 134 | 2.5%
2.708050201 | 132 | 2.5%
3.17805383 | 128 | 2.4%
3.433987204 | 124 | 2.3%
3.526360525 | 124 | 2.3%
2.833213344 | 124 | 2.3%
3.135494216 | 121 | 2.3%
3.33220451 | 115 | 2.2%
Other values (125) | 4022 | 75.6%

Value | Count | Frequency (%)
0 | 2 | < 0.1%
0.6931471806 | 2 | < 0.1%
1.098612289 | 50 | 0.9%
1.386294361 | 43 | 0.8%
1.609437912 | 111 | 2.1%

Value | Count | Frequency (%)
5.666426688 | 1 | < 0.1%
4.987025428 | 1 | < 0.1%
4.930870326 | 1 | < 0.1%
4.875197323 | 1 | < 0.1%
4.852030264 | 1 | < 0.1%

std_alcohol
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 111
Unique (%): 2.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.3499030865611767
Minimum: 2.0794415416798357
Maximum: 2.7013612129514133
Zeros: 0
Zeros (%): 0.0%
Memory size: 41.5 KiB

Quantile statistics

Minimum: 2.079441542
5-th percentile: 2.197224577
Q1: 2.251291799
median: 2.341805806
Q3: 2.433613355
95-th percentile: 2.541601993
Maximum: 2.701361213
Range: 0.6219196713
Interquartile range (IQR): 0.1823215568

Descriptive statistics

Standard deviation: 0.1102245142
Coefficient of variation (CV): 0.04690598297
Kurtosis: -0.7701589841
Mean: 2.349903087
Median Absolute Deviation (MAD): 0.09051400754
Skewness: 0.3604808509
Sum: 12496.78461
Variance: 0.01214944352
Histogram with fixed size bins (bins=10)
Value | Count | Frequency (%)
2.251291799 | 287 | 5.4%
2.240709689 | 260 | 4.9%
2.219203484 | 205 | 3.9%
2.302585093 | 204 | 3.8%
2.351375257 | 193 | 3.6%
2.397895273 | 176 | 3.3%
2.282382386 | 173 | 3.3%
2.2300144 | 169 | 3.2%
2.341805806 | 166 | 3.1%
2.32238772 | 158 | 3.0%
Other values (101) | 3327 | 62.6%

Value | Count | Frequency (%)
2.079441542 | 2 | < 0.1%
2.128231706 | 4 | 0.1%
2.140066163 | 10 | 0.2%
2.151762203 | 16 | 0.3%
2.163323026 | 48 | 0.9%

Value | Count | Frequency (%)
2.701361213 | 1 | < 0.1%
2.653241965 | 1 | < 0.1%
2.642622396 | 1 | < 0.1%
2.63905733 | 11 | 0.2%
2.63188884 | 3 | 0.1%

std_density
Real number (ℝ)

HIGH CORRELATION

Distinct count: 998
Unique (%): 18.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -0.005483177136146001
Minimum: -0.012973796923762338
Maximum: 0.03823946265366855
Zeros: 23
Zeros (%): 0.4%
Memory size: 41.5 KiB

Quantile statistics

Minimum: -0.01297379692
5-th percentile: -0.01020337787
Q1: -0.007830579115
median: -0.005354308762
Q3: -0.00323522771
95-th percentile: -0.0008373505056
Maximum: 0.03823946265
Range: 0.05121325958
Interquartile range (IQR): 0.004595351405

Descriptive statistics

Standard deviation: 0.002979006528
Coefficient of variation (CV): -0.5432993416
Kurtosis: 7.954555702
Mean: -0.005483177136
Median Absolute Deviation (MAD): 0.002274718403
Skewness: 0.6220277549
Sum: -29.15953601
Variance: 8.874479892e-06
Histogram with fixed size bins (bins=10)
Value | Count | Frequency (%)
-0.008032171697 | 60 | 1.1%
-0.002803927333 | 59 | 1.1%
-0.002002002671 | 53 | 1.0%
-0.007226045092 | 53 | 1.0%
-0.002402884616 | 52 | 1.0%
-0.003205130949 | 51 | 1.0%
-0.006621876309 | 50 | 0.9%
-0.006823225348 | 50 | 0.9%
-0.003807238343 | 49 | 0.9%
-0.003405793135 | 48 | 0.9%
Other values (988) | 4793 | 90.1%

Value | Count | Frequency (%)
-0.01297379692 | 1 | < 0.1%
-0.01295353596 | 1 | < 0.1%
-0.01286236672 | 1 | < 0.1%
-0.01268005316 | 1 | < 0.1%
-0.01265979815 | 2 | < 0.1%

Value | Count | Frequency (%)
0.03823946265 | 1 | < 0.1%
0.01024731645 | 1 | < 0.1%
0.003683208652 | 1 | < 0.1%
0.003194890897 | 1 | < 0.1%
0.003145049144 | 2 | < 0.1%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for a linear relationship the steepness of the line does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
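
The calculation described above can be sketched in plain Python. This is a minimal illustration rather than the code the report itself runs; the function name pearson_r and the sample values are made up for the example.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Population covariance and standard deviations; the 1/n factors cancel in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    # A constant input has zero standard deviation and raises ZeroDivisionError here.
    return cov / (std_x * std_y)


# Hypothetical sample values, not taken from the dataset.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]
r = pearson_r(xs, ys)
# Shifting and rescaling one variable (positive scale) leaves r unchanged.
r_rescaled = pearson_r([10.0 * x + 3.0 for x in xs], ys)
```

The rescaled call illustrates the invariance under changes in location and scale mentioned above.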

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
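
As a rough sketch of that recipe, the snippet below ranks both variables and then applies the covariance-over-standard-deviations formula to the ranks. The helper names average_ranks, _pearson and spearman_rho are illustrative only, and giving tied values the mean of their rank positions is an assumption of this sketch (a common convention), not something the report specifies.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def _pearson(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def spearman_rho(xs, ys):
    """Pearson's r computed on the rank variables of xs and ys."""
    return _pearson(average_ranks(xs), average_ranks(ys))


# A monotonic but nonlinear relationship (y = x cubed), chosen for illustration.
xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]
rho = spearman_rho(xs, ys)      # the ranks agree exactly
r_linear = _pearson(xs, ys)     # the linear measure is lower
```

Comparing rho with r_linear on the same data shows why a rank-based measure picks up monotonic relationships that are not linear.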

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
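
The pair-counting definition can be sketched directly, as below. The function name kendall_tau is illustrative. This version implements the formula exactly as stated here, counting tied pairs as neither concordant nor discordant; library implementations often apply a tie-adjusted variant, so their results may differ when ties are present.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    concordant = 0
    discordant = 0
    total = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        total += 1
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move in the same direction
        elif direction < 0:
            discordant += 1   # the variables move in opposite directions
        # direction == 0 means a tie in x or y; it counts toward total only
    return (concordant - discordant) / total


# Three observations: two concordant pairs and one discordant pair.
tau = kendall_tau([1, 2, 3], [1, 3, 2])
```

The quadratic loop over all pairs is fine for a small illustration but would be slow on thousands of observations.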

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

Sample

First rows

(row) | df_index | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality | std_fixed_acidity | std_volatile_acidity | std_citric_acid | std_residual_sugar | std_chlorides | std_free_sulfur_dioxide | std_alcohol | std_density
0 | 0 | 0.297521 | 0.413333 | 0.000000 | 0.019939 | 0.111296 | 0.034722 | 0.064516 | 0.206092 | 0.612403 | 0.191011 | 0.202899 | 5 | 2.001480 | -0.356675 | 0.000000 | 0.641854 | -2.577022 | 2.397895 | 2.240710 | -0.002202
1 | 1 | 0.330579 | 0.533333 | 0.000000 | 0.030675 | 0.147841 | 0.083333 | 0.140553 | 0.186813 | 0.372093 | 0.258427 | 0.260870 | 5 | 2.054124 | -0.127833 | 0.000000 | 0.955511 | -2.322788 | 3.218876 | 2.282382 | -0.003205
2 | 2 | 0.330579 | 0.453333 | 0.024096 | 0.026074 | 0.137874 | 0.048611 | 0.110599 | 0.190669 | 0.418605 | 0.241573 | 0.260870 | 5 | 2.054124 | -0.274437 | 0.200000 | 0.832909 | -2.385967 | 2.708050 | 2.282382 | -0.003005
3 | 3 | 0.611570 | 0.133333 | 0.337349 | 0.019939 | 0.109635 | 0.055556 | 0.124424 | 0.209948 | 0.341085 | 0.202247 | 0.260870 | 6 | 2.415914 | -1.272966 | 0.748331 | 0.641854 | -2.590267 | 2.833213 | 2.282382 | -0.002002
4 | 5 | 0.297521 | 0.386667 | 0.000000 | 0.018405 | 0.109635 | 0.041667 | 0.078341 | 0.206092 | 0.612403 | 0.191011 | 0.202899 | 5 | 2.001480 | -0.415515 | 0.000000 | 0.587787 | -2.590267 | 2.564949 | 2.240710 | -0.002202
5 | 6 | 0.338843 | 0.346667 | 0.036145 | 0.015337 | 0.099668 | 0.048611 | 0.122120 | 0.179102 | 0.449612 | 0.134831 | 0.202899 | 5 | 2.066863 | -0.510826 | 0.244949 | 0.470004 | -2.673649 | 2.708050 | 2.240710 | -0.003606
6 | 7 | 0.289256 | 0.380000 | 0.000000 | 0.009202 | 0.093023 | 0.048611 | 0.034562 | 0.144399 | 0.519380 | 0.140449 | 0.289855 | 7 | 1.987874 | -0.430783 | 0.000000 | 0.182322 | -2.733368 | 2.708050 | 2.302585 | -0.005415
7 | 8 | 0.330579 | 0.333333 | 0.012048 | 0.021472 | 0.106312 | 0.027778 | 0.027650 | 0.186813 | 0.496124 | 0.196629 | 0.217391 | 7 | 2.054124 | -0.544727 | 0.141421 | 0.693147 | -2.617296 | 2.197225 | 2.251292 | -0.003205
8 | 9 | 0.305785 | 0.280000 | 0.216867 | 0.084356 | 0.102990 | 0.055556 | 0.221198 | 0.206092 | 0.488372 | 0.325843 | 0.362319 | 5 | 2.014903 | -0.693147 | 0.600000 | 1.808289 | -2.645075 | 2.833213 | 2.351375 | -0.002202
9 | 10 | 0.239669 | 0.333333 | 0.048193 | 0.018405 | 0.146179 | 0.048611 | 0.135945 | 0.169462 | 0.434109 | 0.179775 | 0.173913 | 5 | 1.902108 | -0.544727 | 0.282843 | 0.587787 | -2.333044 | 2.708050 | 2.219203 | -0.004108

Last rows

(row) | df_index | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality | std_fixed_acidity | std_volatile_acidity | std_citric_acid | std_residual_sugar | std_chlorides | std_free_sulfur_dioxide | std_alcohol | std_density
5308 | 6487 | 0.247934 | 0.093333 | 0.216867 | 0.009202 | 0.071429 | 0.128472 | 0.278802 | 0.119337 | 0.248062 | 0.179775 | 0.173913 | 5 | 1.916923 | -1.514128 | 0.600000 | 0.182322 | -2.956512 | 3.637586 | 2.219203 | -0.006723
5309 | 6488 | 0.090909 | 0.103333 | 0.162651 | 0.171012 | 0.034884 | 0.114583 | 0.258065 | 0.159823 | 0.271318 | 0.157303 | 0.202899 | 6 | 1.589235 | -1.448170 | 0.519615 | 2.463853 | -3.506558 | 3.526361 | 2.240710 | -0.004611
5310 | 6489 | 0.190083 | 0.173333 | 0.174699 | 0.024540 | 0.044850 | 0.083333 | 0.216590 | 0.043763 | 0.263566 | 0.123596 | 0.550725 | 6 | 1.808289 | -1.078810 | 0.538516 | 0.788457 | -3.324236 | 3.218876 | 2.468100 | -0.010677
5311 | 6490 | 0.157025 | 0.086667 | 0.192771 | 0.004601 | 0.048173 | 0.128472 | 0.264977 | 0.069983 | 0.403101 | 0.134831 | 0.376812 | 6 | 1.740466 | -1.560648 | 0.565685 | -0.105361 | -3.270169 | 3.637586 | 2.360854 | -0.009303
5312 | 6491 | 0.223140 | 0.100000 | 0.228916 | 0.010736 | 0.038206 | 0.097222 | 0.244240 | 0.113168 | 0.441860 | 0.179775 | 0.246377 | 5 | 1.871802 | -1.469676 | 0.616441 | 0.262364 | -3.442019 | 3.367296 | 2.272126 | -0.007045
5313 | 6492 | 0.198347 | 0.086667 | 0.174699 | 0.015337 | 0.049834 | 0.079861 | 0.198157 | 0.077694 | 0.426357 | 0.157303 | 0.463768 | 6 | 1.824549 | -1.560648 | 0.538516 | 0.470004 | -3.244194 | 3.178054 | 2.415914 | -0.008899
5314 | 6493 | 0.231405 | 0.160000 | 0.216867 | 0.113497 | 0.063123 | 0.194444 | 0.373272 | 0.150183 | 0.333333 | 0.134831 | 0.231884 | 5 | 1.887070 | -1.139434 | 0.600000 | 2.079442 | -3.057608 | 4.043051 | 2.261763 | -0.005113
5315 | 6494 | 0.223140 | 0.106667 | 0.114458 | 0.009202 | 0.053156 | 0.100694 | 0.241935 | 0.104685 | 0.209302 | 0.134831 | 0.202899 | 6 | 1.871802 | -1.427116 | 0.435890 | 0.182322 | -3.194183 | 3.401197 | 2.240710 | -0.007488
5316 | 6495 | 0.140496 | 0.140000 | 0.180723 | 0.007669 | 0.021595 | 0.065972 | 0.239631 | 0.030461 | 0.480620 | 0.089888 | 0.695652 | 7 | 1.704748 | -1.237874 | 0.547723 | 0.095310 | -3.816713 | 2.995732 | 2.549445 | -0.011374
5317 | 6496 | 0.181818 | 0.086667 | 0.228916 | 0.003067 | 0.018272 | 0.072917 | 0.211982 | 0.044342 | 0.418605 | 0.056180 | 0.550725 | 6 | 1.791759 | -1.560648 | 0.616441 | -0.223144 | -3.912023 | 3.091042 | 2.468100 | -0.010646